How To: Crop type classification for Austria

This example notebook shows the steps towards constructing an automated machine learning pipeline for crop type identification in an area of interest in Austria. Along the pipeline, two different approaches are applied and compared. The first one, the LightGBM, represents a state-of-the-art machine learning algorithm. The second one is a Temporal Convolutional Neural Network architecture from the field of deep learning. The prediction is performed on a time-series of Sentinel-2 scenes from 2018. The example notebook will lead you through the whole process of creating the pipeline, with details provided at each step (see Overview).

Before start

Enjoying the functionality of eo-learn and the simplicity of this example workflow is preceded by the unavoidable setup of an adequate working environment. But trust us, it's worth it! And we'll take you by the hand.

Requirements

Sentinel Hub account

To run the example you'll need a Sentinel Hub account. If you do not have one yet, you can create a free trial account at the Sentinel Hub webpage. If you are a researcher you can even apply for a free non-commercial account at the ESA OSEO page.

Once you have the account set up, login to Sentinel Hub Configurator. By default you will already have the default configuration with an instance ID (alpha numeric code of length 36). For this tutorial we recommend that you create a new configuration (Add new configuration) and set the configuration to be based on Python scripts template. Such configuration will already contain all layers used in a more general Land Use/ Land Cover (LULC) example which are adopted for this example. Otherwise you will have to define the layers for your configuration yourself.

One layer you have to define yourself is your "MY-S2-L2A-BANDS" layer. Therefore you:

  • log in to your Sentinel Hub account
  • go to Configuration Utility and access your newly created LULC configuration
  • here you choose + Add new layer
  • your Layer name is MY-S2-L2A-BANDS
  • in red letters you are requested to ! Please select predefined product or enter your processing script - so you better do...
  • to set your custom script you copy/paste return [B02,B03,B04,B05,B06,B07,B08,B8A,B11,B12] into the Custom script editor and click </> Set Custom Script
  • You just told Sentinel Hub which bands you want to download in the following. Now, go on and Save your own layer

After you have prepared the configuration please put configuration's instance ID into sentinelhub package's configuration file following the configuration instructions.

Sentinel Hub Python package

The Sentinel Hub Python package allows users to make OGC (WMS and WCS) web requests to download and process satellite images within your Python scripts. It supports Sentinel-2 L1C and L2A, Sentinel-1, Landsat 8, MODIS and DEM data source.

eo-learn library

Between the acquisition of a satellite image and actionable information, there is a large processing effort. eo-learn as a collection of modular Python sub-packages allows easy and quick pro-cessing of spatio-temporal data to prototype, build and automate these required large scale EO workflows for AOIs of any size. It also directly enables the application of state-of-the-art tools for computer vision, machine learning and deep learning packages in Python to the data. Especially for non-experts to the field of remote sensing and machine learning it makes extraction of valuable information from satellite imagery easier and more comfortable.

Overview

With the help of the eo-learn library, the entire classification process can be executed in 4 processing blocks, i.e. EOWorkflows.

  1. Ground truth data
  2. EO data
  3. Feature engineering - Crop type grouping - Sampling
  4. Prediction

In more detail the notebook is structured as follows:

I. Imports

II. Configurations

Part I

  1. BBox-Splitter
    • Plot AOI and give the extent
    • Create BBoxes
    • Visualize the selection
  2. Add ground truth data
    • Create EOPatches and add LPIS + area ratio
  3. Add EO data
    • Choose EO features
    • Clean EOPatch list
  4. Feature/ label engineering and Sampling
    • Data visualization
    • Resampling, Interpolation, LPIS data preparation
    • Sampling

Part II

  1. Prediction
    • Set up and train LightGBM model
    • Set up and train TempCNN model
    • Model validation and evaluation
    • Prediction
    • Visualization of the results
  2. Next steps

Now, after the setup you are curious what is next and can't wait to get your hands dirty? Well, let's get started!

Imports

Lets start with some necessary imports.

In [2]:
# set module directory to system path
import sys, os
MAIN_FOLDER = os.getcwd()
import_path = os.path.join(MAIN_FOLDER, 'Tasks')
if import_path not in sys.path:
    sys.path.append(import_path)
In [3]:
# Built-in modules
import math
import shutil
import itertools
from datetime import timedelta


# Basics of Python data handling and visualization
import pandas as pd
import geopandas as gpd
import numpy as np
import matplotlib as mpl
import matplotlib.pyplot as plt
from matplotlib.colors import LinearSegmentedColormap
from shapely.geometry import Polygon
from tqdm.notebook import tqdm


# Imports from eo-learn, sentinelhub-py, and perceptive-sentinel
from sentinelhub import CRS, BBoxSplitter, MimeType

from eolearn.core import LinearWorkflow, FeatureType, SaveTask, OverwritePermission, LoadTask
from eolearn.core import EOPatch, EOTask, CreateEOPatchTask, ZipFeatureTask, MapFeatureTask
from eolearn.geometry import VectorToRaster, ErosionTask
from eolearn.io import S2L2AWCSInput, AddSen2CorClassificationFeature, ExportToTiff
from eolearn.mask import get_s2_pixel_cloud_detector, AddCloudMaskTask, AddValidDataMaskTask
from eolearn.features import SimpleFilterTask, LinearInterpolation
from eolearn.features import NormalizedDifferenceIndexTask, EuclideanNormTask

# Machine learning
import lightgbm as lgb
import joblib
from sklearn.preprocessing import StandardScaler, OneHotEncoder
from sklearn.model_selection import train_test_split
from sklearn import metrics

# Deep Learning
from keras.models import Sequential
from keras.layers import Dense, Flatten, Dropout
from keras.layers.convolutional import Conv1D, MaxPooling1D
from keras.utils import to_categorical

# Notebook specific classes and functions
from CropTypeClassTasks import CleanLPIS, GroupLPIS, LPISCLASS, ConcatenateData, SamplingTaskTask
from CropTypeClassTasks import train_test_split_eopatches, train_test_split_eopatch
from CropTypeClassTasks import plot_confusion_matrix, PredictPatch, AddAreaRatio, FixLPIS, masking
from CropTypeClassTasks import AddGeopediaVectorFeature, Sen2CorValidData, ValidDataFractionPredicate
c:\programdata\anaconda3\lib\site-packages\sklearn\externals\joblib\__init__.py:15: FutureWarning: sklearn.externals.joblib is deprecated in 0.21 and will be removed in 0.23. Please import this functionality directly from joblib, which can be installed with: pip install joblib. If this warning is raised when loading pickled models, you may need to re-serialize those models with scikit-learn 0.21+.
  warnings.warn(msg, category=FutureWarning)
Using TensorFlow backend.
C:\Users\willing\AppData\Roaming\Python\Python37\site-packages\google\protobuf\descriptor.py:47: DeprecationWarning: Using or importing the ABCs from 'collections' instead of from 'collections.abc' is deprecated, and in 3.8 it will stop working
  from google.protobuf.pyext import _message
C:\Users\willing\AppData\Roaming\Python\Python37\site-packages\tensorflow\python\framework\dtypes.py:526: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint8 = np.dtype([("qint8", np.int8, 1)])
C:\Users\willing\AppData\Roaming\Python\Python37\site-packages\tensorflow\python\framework\dtypes.py:527: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_quint8 = np.dtype([("quint8", np.uint8, 1)])
C:\Users\willing\AppData\Roaming\Python\Python37\site-packages\tensorflow\python\framework\dtypes.py:528: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint16 = np.dtype([("qint16", np.int16, 1)])
C:\Users\willing\AppData\Roaming\Python\Python37\site-packages\tensorflow\python\framework\dtypes.py:529: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_quint16 = np.dtype([("quint16", np.uint16, 1)])
C:\Users\willing\AppData\Roaming\Python\Python37\site-packages\tensorflow\python\framework\dtypes.py:530: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint32 = np.dtype([("qint32", np.int32, 1)])
C:\Users\willing\AppData\Roaming\Python\Python37\site-packages\tensorflow\python\framework\dtypes.py:535: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  np_resource = np.dtype([("resource", np.ubyte, 1)])
C:\Users\willing\AppData\Roaming\Python\Python37\site-packages\tensorflow\python\util\nest.py:824: DeprecationWarning: Using or importing the ABCs from 'collections' instead of from 'collections.abc' is deprecated, and in 3.8 it will stop working
  _pywrap_tensorflow.RegisterType("Sequence", _collections.Sequence)
C:\Users\willing\AppData\Roaming\Python\Python37\site-packages\tensorflow\python\training\checkpointable\util.py:448: DeprecationWarning: Using or importing the ABCs from 'collections' instead of from 'collections.abc' is deprecated, and in 3.8 it will stop working
  class _ObjectIdentitySet(collections.MutableSet):
C:\Users\willing\AppData\Roaming\Python\Python37\site-packages\keras\callbacks.py:20: DeprecationWarning: Using or importing the ABCs from 'collections' instead of from 'collections.abc' is deprecated, and in 3.8 it will stop working
  from collections import Iterable

Configurations

In this part you can define your configurations. The basic configurations are set for an example running smoothly.

In [4]:
# define in- and output folders
output_path = os.path.join(MAIN_FOLDER, 'Output')
general_data_path = os.path.join(MAIN_FOLDER, 'GeneralData')
patch_path = os.path.join(MAIN_FOLDER, 'Output', 'EOPatches')
thresLPIS_path = os.path.join(MAIN_FOLDER, 'Output', 'EOPatches_Low_LPIS_Thres')
samples_path = os.path.join(MAIN_FOLDER, 'Output', 'Samples')
models_path = os.path.join(MAIN_FOLDER, 'Output', 'Models')
predictions_path = os.path.join(MAIN_FOLDER, 'Output', 'Predictions')

# For reference colormap
lpisclass_cmap = mpl.colors.ListedColormap([entry.color for entry in LPISCLASS])
lpisclass_norm = mpl.colors.BoundaryNorm(np.arange(-0.5, 26, 1), lpisclass_cmap.N)
class_names = [entry.class_name for entry in LPISCLASS]
class_ids = [entry.id for entry in LPISCLASS]


### 1. BBox-Splitter
##  Plot AOI and give extent
INPUT_FILE = os.path.join(general_data_path, 'Area_AOI.geojson') # Geojson or Shapefile of the area of interest
austria = os.path.join(general_data_path, 'Area_Austria.geojson') # Geojson of austrian national borders
crs = CRS.UTM_33N # wanted coordinate System of the AOI


### 2. Add ground truth data
## Create EOPatches and add LPIS + area ratio
year = '2018' # year of interest
layerID_dict = {'2018': 2647, '2017': 2034, '2016': 2033} # Layer IDs of Geopedia Layer
layerID = layerID_dict[year] # Layer ID for Austria of year set by "year"
patch_list = os.listdir(patch_path) # List of created EOPatches names


### 3. Add EO data
## Choose EO features
maxcloud = 0.8 # maximum cloudcoverage of sentinel tiles used for download
datafrac = 0.7 # keep only frames with valid data fraction over x%
## Clean EOPatch list
lpis_thres = 0.13 # Patches with less than x% of LPIS coverage are excluded in the following
## Add EO data
time_interval = [f'{year}-01-01', f'{year}-09-30'] # the start and end date for downloading satellite data


### 4. Feature and label engineering
## Feature concatenation, interpolation and LPIS data preparation
day_range = 8 # the realtime range of valid satellite images is resampled to a x day equidistant range 
## Prepare LPIS data
grouping_id = 'basic' # define grouping id (has to be identical to endings of the two grouping files)
# File that contains LPIS Crop ID and wanted groups - Colums of shape: CROP_ID, english, slovenian, latin, GROUP_1
lpis_to_group_file = os.path.join(general_data_path, 'at_lpis_{}_crop_to_group_mapping_{}.csv'.format(year, grouping_id))
# File that contains the wanted groups and their ID - Colums of shape: GROUP_1, GROUP_1_ID
crop_group_file = os.path.join(general_data_path, 'crop_group_1_definition_{}.csv'.format(grouping_id))


### 5. Sampling
## Sampling per EOPatch
pixel_thres = 1000 # Pixel thresold necessary for a class to be considered in sampling
samp_class = 500 # take x samples per class per EOPatch
## Combine samples and split into train and test data
test_ratio = 4 # take every xth patch for testing
features_dict = 'FEATURES_SAMPLED' # name of the dictionary where the sample features are saved
labels_dict = 'LPIS_class_{}_ERODED_SAMPLED'.format(grouping_id) # name of the dictionary where the sample labels are saved

1. From AOI to BBox

Plot AOI and give extent

Spotlight on your "INPUT_FILE" configuration! This is where you have the possibility to easily adapt the workflow to your needs. Take your pick and replace the AOI file in the General data folder. Either shapefile or geojson formatted version of your AOI is split into smaller patches by eo-learn. The total number of patches depends on the AOIs size. Automated splitting is supposed to create patches of size 10 x 10 km.

In [4]:
# define Area of interest
aoi = gpd.read_file(INPUT_FILE) # read AOI file

aoi_shape = aoi.geometry.values[-1] # get aoi shape

aoi = aoi.to_crs(crs={'init': CRS.ogc_string(crs)}) # assign coordinate system to 

# define BBox-Splitter split values
ShapeVal_a = round(aoi_shape.bounds[2] - aoi_shape.bounds[0])
ShapeVal_b = round(aoi_shape.bounds[3] - aoi_shape.bounds[1])

SplitVal_a = max(1, int(ShapeVal_a/1e4))
SplitVal_b = max(1, int(ShapeVal_b/1e4))

# Give extent of AOI + grid count and plot AOI
print('The extent of the AOI is {}m x {}m, so it is split into a grid of {} x {}.'.format(ShapeVal_a, 
                                                                                          ShapeVal_b, 
                                                                                          SplitVal_a, 
                                                                                          SplitVal_b))

aoi.plot()
plt.axis('off');
C:\Users\willing\AppData\Roaming\Python\Python37\site-packages\pyproj\crs\crs.py:53: FutureWarning: '+init=<authority>:<code>' syntax is deprecated. '<authority>:<code>' is the preferred initialization method. When making the change, be mindful of axis order changes: https://pyproj4.github.io/pyproj/stable/gotchas.html#axis-order-changes-in-proj-6
  return _prepare_from_string(" ".join(pjargs))
C:\Users\willing\AppData\Roaming\Python\Python37\site-packages\pyproj\crs\crs.py:53: FutureWarning: '+init=<authority>:<code>' syntax is deprecated. '<authority>:<code>' is the preferred initialization method. When making the change, be mindful of axis order changes: https://pyproj4.github.io/pyproj/stable/gotchas.html#axis-order-changes-in-proj-6
  return _prepare_from_string(" ".join(pjargs))
The extent of the AOI is 20664m x 20561m, so it is split into a grid of 2 x 2.

Create BBoxes

The simple patch polygons are transformed into bounding boxes suitable for serving as geometrical EOPatch frame.

In [5]:
# split area of interest into an appropriate number of BBoxes
bbox_splitter = BBoxSplitter([aoi_shape], crs, (SplitVal_a, SplitVal_b))

bbox_list = np.array(bbox_splitter.get_bbox_list()) # get list of BBox geometries
info_list = np.array(bbox_splitter.get_info_list()) # get list of x (column) and y(row) indices

print('Each bounding box also has some info how it was created.\nExample:\n'
      'bbox: {}\ninfo: {}\n'.format(bbox_list[0].__repr__(), info_list[0]))
Each bounding box also has some info how it was created.
Example:
bbox: BBox(((420862.3179607267, 5329537.336315366), (431194.28800678457, 5339817.792378783)), crs=CRS('32633'))
info: {'parent_bbox': BBox(((420862.3179607267, 5329537.336315366), (441526.25805284246, 5350098.248442201)), crs=CRS('32633')), 'index_x': 0, 'index_y': 0}

Visualize the selection

First visualize the GeoDataFrame of the bounding boxes

In [6]:
# create GeoDataFrame of BBoxes
geometry = [Polygon(bbox.get_polygon()) for bbox in bbox_list] # get geometry from bbox_list for creating GeoSeries
idxs_x = [info['index_x'] for info in info_list] # get column index for naming EOPatch
idxs_y = [info['index_y'] for info in info_list] # get row index for naming EOPatch

gdf = gpd.GeoDataFrame({'index_x': idxs_x, 'index_y': idxs_y},
                       crs={'init': CRS.ogc_string(crs)},
                       geometry=geometry)

shapefile_name = os.path.join(output_path, 'BBoxes.shp')
gdf.to_file(shapefile_name)

gdf.head()
Out[6]:
index_x index_y geometry
0 0 0 POLYGON ((420862.3179607267 5329537.336315366,...
1 0 1 POLYGON ((420862.3179607267 5339817.792378783,...
2 1 0 POLYGON ((431194.2880067846 5329537.336315366,...
3 1 1 POLYGON ((431194.2880067846 5339817.792378783,...

Second visualize the split AOI with reference to Austrian national borders

In [7]:
# Plot AOI overview
austria_gdf = gpd.read_file(austria)

fontdict = {'family': 'monospace', 'weight': 'normal', 'size': 11}

# if bboxes have all same size, estimate offset
xl, yl, xu, yu = gdf.geometry[0].bounds
xoff, yoff = (xu - xl) / 3, (yu - yl) / 5

# main figure
fig, ax = plt.subplots(figsize=(20, 20))
gdf.plot(ax=ax,facecolor='w',edgecolor='r',alpha=0.5,linewidth=5)
aoi.plot(ax=ax, facecolor='w',edgecolor='k',alpha=0.5)
austria_gdf.plot(ax=ax, facecolor='w',edgecolor='b',alpha=0.5)
ax.set_title('Test Area Splitted');
plt.axis('off')

# sub figure
a = plt.axes([0.2, 0.6, .2, .2])
gdf.plot(ax=a, facecolor='w',edgecolor='r',alpha=0.5, linewidth=3)
aoi.plot(ax=a, facecolor='w',edgecolor='k',alpha=0.5, linewidth=3)
plt.xticks([])
plt.yticks([])
Out[7]:
([], <a list of 0 Text yticklabel objects>)

2. Add ground truth data

Create EOPatches and add LPIS data + area ratio

Now it's time to create EOPatches and start filling them with data.

Add data

  • At first you transform your basic geometric frames into proper EOPatches. You can then fill these handy data containers endlessly.
  • As a start you add your ground truth data that is later used as a reference to validate your prediction results. Here, you use Austrian LPIS data containing agricultural information on the field-level. In the case of this example you download your 2018 data in vector format automatically from Geopedia using Sentinel-Hub tasks. For further observation you can also download the complete and free dataset for Austria here.
  • Additionally a ratio value is added showing the percentage of the agricultural area in the respective EOPatch. The importance of this ratio will become apparent in the following steps.

An EOPatch is created and manipulated using EOTasks. Due to the potentially large number of EOPatches, automation of the processing pipeline is absolutely crucial. Therefore EOTasks are chained in an EOWorkflow. In this example the final workflow is executed on all patches, which are saved to the specified directory.

Set up your 1. EOWorkflow - Ground truth data

The EOTasks need to be put in some order and executed one by one. This can be achieved by manually executing the tasks, or more conveniently, defining an EOWorkflow which does this for you. An EOWorkflow can be linear or more complex, but it should be acyclic. Here we will use the linear case of the EOWorkflow, available as LinearWorkflow

In [8]:
# TASK FOR CREATING EOPATCH
create = CreateEOPatchTask()

# TASK FOR ADDING LPIS DATA FROM GEOPEDIA
# here you can choose the year of interest
# also you have to set the corresponding Geopedialayer-ID
add_lpis = AddGeopediaVectorFeature((FeatureType.VECTOR_TIMELESS, 'LPIS_{}'.format(year)),
                                        layer=layerID, year_filter=None, drop_duplicates=True)

# TASK FOR ADDING AN AREA RATIO
# the area ratio indicates the EOPatches proportion of LPIS coverage
area_ratio = AddAreaRatio((FeatureType.VECTOR_TIMELESS, 'LPIS_{}'.format(year)),
                              (FeatureType.SCALAR_TIMELESS, 'FIELD_AREA_RATIO'))

# TASK FOR SAVING TO OUTPUT
save = SaveTask(patch_path, overwrite_permission=OverwritePermission.OVERWRITE_PATCH)

# define the workflow
workflow = LinearWorkflow(create, 
                          add_lpis, 
                          area_ratio, 
                          save)

Run your first EOWorkflow

In [9]:
# execute workflow
pbar = tqdm(total=len(bbox_list))
for idx, bbox in enumerate(bbox_list):
    bbox = bbox_splitter.bbox_list[idx]
    info = bbox_splitter.info_list[idx]
    patch_name = f'eopatch_{idx}_col-{info["index_x"]}_row-{info["index_y"]}'
    workflow.execute({create:{'bbox':bbox}, save:{'eopatch_folder':patch_name}})
    pbar.update(1)
C:\Users\willing\AppData\Roaming\Python\Python37\site-packages\pyproj\crs\crs.py:53: FutureWarning: '+init=<authority>:<code>' syntax is deprecated. '<authority>:<code>' is the preferred initialization method. When making the change, be mindful of axis order changes: https://pyproj4.github.io/pyproj/stable/gotchas.html#axis-order-changes-in-proj-6
  return _prepare_from_string(" ".join(pjargs))
C:\Users\willing\AppData\Roaming\Python\Python37\site-packages\pyproj\crs\crs.py:53: FutureWarning: '+init=<authority>:<code>' syntax is deprecated. '<authority>:<code>' is the preferred initialization method. When making the change, be mindful of axis order changes: https://pyproj4.github.io/pyproj/stable/gotchas.html#axis-order-changes-in-proj-6
  return _prepare_from_string(" ".join(pjargs))
C:\Users\willing\AppData\Roaming\Python\Python37\site-packages\pyproj\crs\crs.py:53: FutureWarning: '+init=<authority>:<code>' syntax is deprecated. '<authority>:<code>' is the preferred initialization method. When making the change, be mindful of axis order changes: https://pyproj4.github.io/pyproj/stable/gotchas.html#axis-order-changes-in-proj-6
  return _prepare_from_string(" ".join(pjargs))
C:\Users\willing\AppData\Roaming\Python\Python37\site-packages\pyproj\crs\crs.py:53: FutureWarning: '+init=<authority>:<code>' syntax is deprecated. '<authority>:<code>' is the preferred initialization method. When making the change, be mindful of axis order changes: https://pyproj4.github.io/pyproj/stable/gotchas.html#axis-order-changes-in-proj-6
  return _prepare_from_string(" ".join(pjargs))
C:\Users\willing\AppData\Roaming\Python\Python37\site-packages\pyproj\crs\crs.py:53: FutureWarning: '+init=<authority>:<code>' syntax is deprecated. '<authority>:<code>' is the preferred initialization method. When making the change, be mindful of axis order changes: https://pyproj4.github.io/pyproj/stable/gotchas.html#axis-order-changes-in-proj-6
  return _prepare_from_string(" ".join(pjargs))
C:\Users\willing\AppData\Roaming\Python\Python37\site-packages\pyproj\crs\crs.py:53: FutureWarning: '+init=<authority>:<code>' syntax is deprecated. '<authority>:<code>' is the preferred initialization method. When making the change, be mindful of axis order changes: https://pyproj4.github.io/pyproj/stable/gotchas.html#axis-order-changes-in-proj-6
  return _prepare_from_string(" ".join(pjargs))
C:\Users\willing\AppData\Roaming\Python\Python37\site-packages\pyproj\crs\crs.py:53: FutureWarning: '+init=<authority>:<code>' syntax is deprecated. '<authority>:<code>' is the preferred initialization method. When making the change, be mindful of axis order changes: https://pyproj4.github.io/pyproj/stable/gotchas.html#axis-order-changes-in-proj-6
  return _prepare_from_string(" ".join(pjargs))
C:\Users\willing\AppData\Roaming\Python\Python37\site-packages\pyproj\crs\crs.py:53: FutureWarning: '+init=<authority>:<code>' syntax is deprecated. '<authority>:<code>' is the preferred initialization method. When making the change, be mindful of axis order changes: https://pyproj4.github.io/pyproj/stable/gotchas.html#axis-order-changes-in-proj-6
  return _prepare_from_string(" ".join(pjargs))

Visualize the added vector data for one example EOPatch

In [10]:
eopatch_name = 'eopatch_0_col-0_row-0' # get the name of the first newly created EOPatch
eopatch = EOPatch.load(os.path.join(patch_path, eopatch_name))

# plot vector data
print('Plotting LPIS vector data of eopatch: {}'.format(eopatch_name))
fig, ax = plt.subplots(figsize=(20, 20))
LPIS = eopatch.vector_timeless['LPIS_{}'.format(year)]
LPIS.plot(column='SNAR_BEZEI', axes=ax, categorical=True)
ax.set_aspect('auto')
ax.set_xticks(ticks=[])
ax.set_yticks(ticks=[])

del eopatch
Plotting LPIS vector data of eopatch: eopatch_0_col-0_row-0
C:\Users\willing\AppData\Roaming\Python\Python37\site-packages\geopandas\plotting.py:400: FutureWarning: 'axes' is deprecated, please use 'ax' instead (for consistency with pandas)
  "(for consistency with pandas)", FutureWarning)

As you can see, the crop types in your AOI are very diverse. Each colour stands for one of the over 200 LPIS classes.

3. Add EO data

Choose EO features

Now, it's time to add Sentinel-2 data to the EOPatches. You are lucky to be using eo-learn, as this is simply done by setting up another single EOWorkflow including only a single EOTask for adding your satellite images. The remaining tasks allow you to create extensive valid data masks and useful indices using a ridiculously small amount of code.

In detail you add:

  • L2A bands [B02,B03,B04,B05,B06,B07,B08,B8A,B11,B12]
  • Sen2cor's scene classification map and snow probability map
  • SentinelHub's cloud probability map and cloud mask
  • A mask of validity, based on acquired data from Sentinel and cloud coverage.

    1. IS_DATA == True
    2. CLOUD_MASK == 0 (1 indicates that pixel was identified to be covered with cloud)
  • Filter out time frames with < 70 % valid coverage (no clouds)
  • Calculate and add NDVI, NDWI, NORM for helping the algorithm to detect relationships between the spectral bands.

Set up your 2. EOWorkflow - EO data

In [11]:
# TASK TO LOAD EXISTING EOPATCH
load = LoadTask(patch_path)

# TASK TO ADD SENTINEL 2 LEVEL 2A DATA
# Here also a simple filter of cloudy scenes is done. A detailed cloud cover 
# detection is performed within the next steps
add_l2a = S2L2AWCSInput(layer='MY-S2-L2A-BANDS', 
                        resx='10m', 
                        resy='10m', 
                        maxcc=maxcloud, 
                        time_difference=timedelta(hours=2),
                        raise_download_errors=False)

# TASK TO ADD SEN2COR'S SCENE
# add sen2cor's scene classification map and snow probability map
add_scl = AddSen2CorClassificationFeature(sen2cor_classification='SCL', 
                                          layer='TRUE-COLOR-S2-L2A',
                                          image_format=MimeType.TIFF_d32f, 
                                          raise_download_errors=False)

# TASK FOR CLOUD INFO
# cloud detection is performed at 160m resolution
# and the resulting cloud probability map and mask
# are scaled to EOPatch's resolution
cloud_classifier = get_s2_pixel_cloud_detector(average_over=2, dilation_size=1, all_bands=False)
add_clm = AddCloudMaskTask(cloud_classifier,
                           'BANDS-S2CLOUDLESS',
                           cm_size_y='160m',
                           cm_size_x='160m',
                           cmask_feature='CLM')

# create valid data masks
scl_valid_classes = [2, 4, 5, 6, 7]

# TASKs FOR ADDING L2A and L1C VALID DATA MASKS
# convert cloudmask to validmask
add_clm_valid =MapFeatureTask((FeatureType.MASK, 'CLM'),
                             (FeatureType.MASK, 'CLM_VALID'),
                             np.logical_not)
# combine IS_DATA and CLM_VALID
add_l1c_valmask = ZipFeatureTask({FeatureType.MASK: ['IS_DATA', 'CLM_VALID']}, 
                            (FeatureType.MASK, 'L1C_VALID'),
                            np.logical_and)
# combine IS_DATA and SCL (using an erosion radius of 6 and a dilation radius of 22 pixel for SCL classes)
add_l2a_valmask = AddValidDataMaskTask(Sen2CorValidData(scl_valid_classes, 6, 22), 'L2A_VALID')
# combine all validmasks
add_valmask = ZipFeatureTask({FeatureType.MASK: ['L1C_VALID', 'L2A_VALID']}, 
                            (FeatureType.MASK, 'VALID_DATA'),
                            np.logical_and)

# TASK TO FILTER OUT SCENES INCLUDING TOO MANY UNVALID PIXEL
# keep frames with > x % valid coverage
valid_data_predicate = ValidDataFractionPredicate(datafrac)
filter_task = SimpleFilterTask((FeatureType.MASK, 'VALID_DATA'), valid_data_predicate)

# TASK FOR CALCULATING INDICES
# NDVI = Normalized Difference Vegetation Index
# NDWI = Normalized Difference Water Index
# NORM = Euclidean Norm
ndvi = NormalizedDifferenceIndexTask((FeatureType.DATA, 'MY-S2-L2A-BANDS'), 
                                     (FeatureType.DATA, 'NDVI'),
                                     [6, 2])
ndwi = NormalizedDifferenceIndexTask((FeatureType.DATA, 'MY-S2-L2A-BANDS'), 
                                     (FeatureType.DATA, 'NDWI'),
                                     [1, 6])
norm = EuclideanNormTask((FeatureType.DATA, 'MY-S2-L2A-BANDS'), (FeatureType.DATA, 'NORM'))

# TASK FOR SAVING TO OUTPUT
save = SaveTask(patch_path, compress_level=1, overwrite_permission=OverwritePermission.OVERWRITE_PATCH)

workflow = LinearWorkflow(load, 
                          add_l2a, 
                          add_scl, 
                          add_clm, 
                          add_clm_valid, 
                          add_l1c_valmask, 
                          add_l2a_valmask, 
                          add_valmask, 
                          filter_task, 
                          ndvi, 
                          ndwi, 
                          norm, 
                          save)
c:\programdata\anaconda3\lib\site-packages\sklearn\utils\deprecation.py:144: FutureWarning: The sklearn.preprocessing.label module is  deprecated in version 0.22 and will be removed in version 0.24. The corresponding classes / functions should instead be imported from sklearn.preprocessing. Anything that cannot be imported from sklearn.preprocessing is now part of the private API.
  warnings.warn(message, FutureWarning)

Clean EOPatch list

Most likely, along with this innovative workflow, you are pushing humankind forward with other processes on your machine. Therefore you do not want to waste your ressources on EOPatches containing very little agricultural area. Before running your already set up EOWorkflow, clean your EOPatch list.

Remember the earlier calculated LPIS ratio? From here on you only keep EOPatches containing more than 13% agricultural area. The irrelevant ones are moved to the sidetrack. If you want to use EOPatches more extensively covered with agricultural area simply increase your "lpis_thres" configuration.

In [12]:
# in GeoDataFrame label patches with certain thresold either as to do (1) or not to do (0)
gdf[f'far{year}'] = -2.0
for idx, row in gdf.iterrows():
    patch_name = os.path.join(patch_path, f'eopatch_{idx}_col-{row.index_x}_row-{row.index_y}')
    eop = EOPatch.load(str(patch_name), lazy_loading=True)
    gdf.loc[idx, f'far{year}'] = eop.scalar_timeless['FIELD_AREA_RATIO'][0]
    
gdf[f'todo{year}'] = (gdf[f'far{year}'] > lpis_thres) * 1
gdf.to_file(shapefile_name)

# move EOPatch folders with LPIS coverage beneath thresold into seperate folder
move = []
patch_list_delete = gpd.read_file(shapefile_name)
patch_list_delete = patch_list_delete[patch_list_delete[f'todo{year}'] == 0] # identify EOPatches with insufficient LPIS thresold

# create list including names of the identified EOPatches
for idx in patch_list_delete.index:
    info = bbox_splitter.info_list[idx]
    patch_name = f'eopatch_{idx}_col-{info["index_x"]}_row-{info["index_y"]}'
    move.append(patch_name)
    
print('EOPatches moved to sidetrack: ' + str([patch_name for patch_name in move]))

# move identified EOPatches to alternative folder
for patch_name in move:
    shutil.move(os.path.join(patch_path, patch_name), os.path.join(thresLPIS_path, patch_name))

patch_list = os.listdir(patch_path) # update patch_list
EOPatches moved to sidetrack: []

Run second EOWorkflow

  • Set up EOWorkflow? Check!
  • Ignored irrelevant EOPatches? Check!

Then go ahead and run your EOWorkflow on the basis of your "time_interval" configuration!

In [13]:
# execute workflow and save the names of those that failed
failed = []

pbar = tqdm(total=len(patch_list))
for patch_name in patch_list:
    # add EO data if possible
    try:
        workflow.execute({load: {'eopatch_folder': patch_name},
                          add_l2a: {'time_interval': time_interval},
                          save: {'eopatch_folder': patch_name}})
    # append EOPatch name to list for further investigation
    except Exception as ex:
        print(f'Failed {patch_name} with {ex}')
        failed.append(patch_name)

    pbar.update()
C:\Users\willing\AppData\Roaming\Python\Python37\site-packages\eolearn\features\bands_extraction.py:86: RuntimeWarning: invalid value encountered in true_divide
  ndi = (band_a - band_b + self.acorvi_constant) / (band_a + band_b + self.acorvi_constant)
C:\Users\willing\AppData\Roaming\Python\Python37\site-packages\eolearn\features\bands_extraction.py:86: RuntimeWarning: invalid value encountered in true_divide
  ndi = (band_a - band_b + self.acorvi_constant) / (band_a + band_b + self.acorvi_constant)
Failed eopatch_1_col-0_row-1 with During execution of task SaveTask: [Errno 28] No space left on device
C:\Users\willing\AppData\Roaming\Python\Python37\site-packages\eolearn\features\bands_extraction.py:86: RuntimeWarning: invalid value encountered in true_divide
  ndi = (band_a - band_b + self.acorvi_constant) / (band_a + band_b + self.acorvi_constant)
Failed eopatch_2_col-1_row-0 with During execution of task SaveTask: [Errno 28] No space left on device
C:\Users\willing\AppData\Roaming\Python\Python37\site-packages\eolearn\features\bands_extraction.py:86: RuntimeWarning: invalid value encountered in true_divide
  ndi = (band_a - band_b + self.acorvi_constant) / (band_a + band_b + self.acorvi_constant)
Failed eopatch_3_col-1_row-1 with During execution of task SaveTask: [Errno 28] No space left on device

4. Feature/ label engineering and Sampling

The classifier you are using for the following prediction is very picky when it comes to the format of the input data. To feed your thoughtfully compiled data to the algorithm it needs some preparation.

Data visualization

Now, after all necessary data is added let's load a single EOPatch and look at the structure. By executing

EOPatch.load(os.path.join(patch_path, 'eopatch_0_col-0_row-0')

You obtain the following structure:

EOPatch(
  data: {
    MY-S2-L2A-BANDS: numpy.ndarray(shape=(39, 1028, 1033, 10), dtype=float32)
    NDVI: numpy.ndarray(shape=(39, 1028, 1033, 1), dtype=float32)
    NDWI: numpy.ndarray(shape=(39, 1028, 1033, 1), dtype=float32)
    NORM: numpy.ndarray(shape=(39, 1028, 1033, 1), dtype=float32)
  }
  mask: {
    CLM: numpy.ndarray(shape=(39, 1028, 1033, 1), dtype=bool)
    CLM_VALID: numpy.ndarray(shape=(39, 1028, 1033, 1), dtype=bool)
    IS_DATA: numpy.ndarray(shape=(39, 1028, 1033, 1), dtype=bool)
    L1C_VALID: numpy.ndarray(shape=(39, 1028, 1033, 1), dtype=bool)
    L2A_VALID: numpy.ndarray(shape=(39, 1028, 1033, 1), dtype=bool)
    SCL: numpy.ndarray(shape=(39, 1028, 1033, 1), dtype=int32)
    VALID_DATA: numpy.ndarray(shape=(39, 1028, 1033, 1), dtype=bool)
  }
  scalar: {}
  label: {}
  vector: {}
  data_timeless: {}
  mask_timeless: {}
  scalar_timeless: {
    FIELD_AREA_RATIO: numpy.ndarray(shape=(1,), dtype=float64)
  }
  label_timeless: {}
  vector_timeless: {
    LPIS_2018: geopandas.GeoDataFrame(columns=['geometry', 'FS_KENNUNG', 'SL_FLAECHE', 'ID', 'SNAR_BEZEI', 'DateImported'], length=4091, crs=EPSG:32633)
  }
  meta_info: {
    maxcc: 0.8
    service_type: 'wcs'
    size_x: '10m'
    size_y: '10m'
    time_difference: datetime.timedelta(seconds=7200)
    time_interval: ['2018-01-01', '2018-09-30']
  }
  bbox: BBox(((420862.3179607267, 5329537.336315366), (431194.28800678457, 5339817.792378783)), crs=CRS('32633'))
  timestamp: [datetime.datetime(2018, 1, 6, 10, 4, 51), ..., datetime.datetime(2018, 9, 28, 10, 0, 24)], length=39
)

As you can see your EO data and indices are stored in data.FeatureType your valid data masks in mask.FeatureType and your ground truth data in vector_timeless.FeatureType

It is possible to access various EOPatch content via calls like:

eopatch.timestamp
eopatch.vector_timeless['LPIS_2018']
eopatch.data['NDVI'][0]
eopatch.data['MY-S2-L2A-BANDS'][5][..., [3, 2, 1]]

Plot RGB image

In order to get a quick and realistic overview of your AOI you plot the true color image of one EOPatch

In [14]:
eopatch_name = 'eopatch_0_col-0_row-0' # get the name of the first newly created EOPatch
eopatch = EOPatch.load(os.path.join(patch_path, eopatch_name), lazy_loading=True)

fig, ax = plt.subplots(figsize=(20, 20))
plt.imshow(np.clip(eopatch.data['MY-S2-L2A-BANDS'][0][..., [2, 1, 0]] * 3.5, 0, 1))
plt.xticks([])
plt.yticks([])
ax.set_aspect('auto')

del eopatch

Plot mean NDVI

Plot the time-wise mean of NDVI for the whole region. Filter out clouds in the mean calculation.

In [15]:
eopatch_name = 'eopatch_0_col-0_row-0' # get the name of the first newly created EOPatch
eopatch = EOPatch.load(os.path.join(patch_path, eopatch_name), lazy_loading=True)

fig, ax = plt.subplots(figsize=(20, 20))
ndvi = eopatch.data['NDVI']
mask = eopatch.mask['VALID_DATA']
ndvi[~mask] = np.nan
ndvi_mean = np.nanmean(ndvi, axis=0).squeeze()
im = ax.imshow(ndvi_mean, vmin=0, vmax=0.8, cmap=plt.get_cmap('YlGn'))
ax.set_xticks([])
ax.set_yticks([])
ax.set_aspect('auto')

cb = fig.colorbar(im, ax=ax, orientation='horizontal', pad=0.01, aspect=100)
cb.ax.tick_params(labelsize=20)
plt.show()

del eopatch

Plot all masks

To see how the valid data masks look like and work together, you can compare them to a regular RGB image. For demonstration reasons a timeframe is selected which contains cloud-covered area.

In [16]:
eopatch_name = 'eopatch_0_col-0_row-0' # get the name of the first newly created EOPatch
eopatch = EOPatch.load(os.path.join(patch_path, eopatch_name), lazy_loading=True)

tidx = 1
plt.figure(figsize=(20,20))

plt.subplot(331)
plt.imshow(np.clip(eopatch.data['MY-S2-L2A-BANDS'][tidx][..., [2,1,0]] * 3.5,0,1))
plt.xticks([])
plt.yticks([])
plt.title('MY-S2-L2A-BANDS - RGB')

plt.subplot(332)
plt.imshow(eopatch.mask['IS_DATA'][tidx].squeeze(), vmin=0, vmax=1, cmap='gray')
plt.xticks([])
plt.yticks([])
plt.title('IS_DATA - Data availability')

plt.subplot(333)
plt.imshow(eopatch.mask['CLM'][tidx].squeeze(), vmin=0, vmax=1, cmap='gray')
plt.xticks([])
plt.yticks([])
plt.title('CLM - Cloudmask')

plt.subplot(334)
plt.imshow(eopatch.mask['L1C_VALID'][tidx].squeeze(), vmin=0, vmax=1, cmap='gray')
plt.xticks([])
plt.yticks([])
plt.title('L1C_VALID - L1C valid data mask')

plt.subplot(335)
plt.imshow(eopatch.mask['L2A_VALID'][tidx].squeeze(), vmin=0, vmax=1, cmap='gray')
plt.xticks([])
plt.yticks([])
plt.title('L2A_VALID - L2A valid data mask')

plt.subplot(336)
plt.imshow(eopatch.mask['SCL'][tidx].squeeze(), cmap='jet')
plt.xticks([])
plt.yticks([])
plt.title('SCL - Sen2Cor scene classification map')

plt.subplot(338)
plt.imshow(eopatch.mask['VALID_DATA'][tidx].squeeze(), vmin=0, vmax=1, cmap='gray')
plt.xticks([])
plt.yticks([])
plt.title('VALID_DATA - Combined valid data mask')

del eopatch

As you can see invalid pixel from the different cloud masks and Sen2Cor scene classification map are combined. For SCL the classes:

  • 1 SC_SATURATED_DEFECTIVE
  • 3 SC_CLOUD_SHADOW
  • 8 SC_CLOUD_MEDIUM_PROBABILITY
  • 9 CLOUD_HIGH_PROBABILITY
  • 10 THIN_CIRRUS
  • 11 SNOW

are considered as invalid.

Plot spatial mean NDVI timeseries

Plot the mean of NDVI over all pixels in a single patch throughout the year. Filter out clouds in the mean calculation.

In [17]:
eopatch_name = 'eopatch_0_col-0_row-0' # get the name of the first newly created EOPatch
eopatch = EOPatch.load(os.path.join(patch_path, eopatch_name), lazy_loading=True)

ndvi_series = eopatch.data['NDVI']
time = np.array(eopatch.timestamp)
mask = eopatch.mask['VALID_DATA']
t, w, h, _ = ndvi_series.shape

ndvi_clean = ndvi_series.copy()
ndvi_clean[~mask] = np.nan # set values of invalid pixels to NaN's

# Calculate means, remove NaN's from means
ndvi_mean = np.nanmean(ndvi_series.reshape(t, w * h).squeeze(), axis=1)
ndvi_mean_clean = np.nanmean(ndvi_clean.reshape(t, w * h).squeeze(), axis=1)
time_clean = time[~np.isnan(ndvi_mean_clean)]
ndvi_mean_clean = ndvi_mean_clean[~np.isnan(ndvi_mean_clean)]

fig, ax = plt.subplots(figsize=(20, 5))
plt.plot(time_clean, ndvi_mean_clean, 's-', label = 'Mean NDVI with cloud cleaning')
plt.plot(time, ndvi_mean, 'o-', label='Mean NDVI without cloud cleaning')
plt.xlabel('Time', fontsize=15)
plt.ylabel('Mean NDVI over patch', fontsize=15)
plt.xticks(fontsize=15)
plt.yticks(fontsize=15)
plt.legend(loc=2, prop={'size': 15});
ax.set_aspect('auto')

del eopatch
c:\programdata\anaconda3\lib\site-packages\pandas\plotting\_matplotlib\converter.py:103: FutureWarning: Using an implicitly registered datetime converter for a matplotlib plotting method. The converter was registered by pandas on import. Future versions of pandas will require you to explicitly register matplotlib converters.

To register the converters:
	>>> from pandas.plotting import register_matplotlib_converters
	>>> register_matplotlib_converters()
  warnings.warn(msg, FutureWarning)

The time series displayed looks very fragmented for the temporal resolution of the Sentinel 2 data to be so hyped, right?

This is what you get if you choose to keep timeframes with valid data fraction over 70% only. You set the value in your "datafrac" configuration. If you expect a nice overview of vegetation growing stages, reality kicks in and gives you mostly cloudy conditions in the first months of the year. The good thing about being picky about the validity of your timeframes is reduced data volume. Invalid frames contain no additional value for your later analysis anyways.

Resampling - Interpolation - LPIS data preparation - Sampling

Feature concatenation and interpolation

  • For easier handling of the data you concatenate MY-S2-L2A-BANDS, NDVI, NDWI, NORM info into a single feature called FEATURES
  • Perform temporal interpolation (filling gaps and resampling to the same dates) by:
    • creating a linear interpolation task in the temporal dimension
    • providing the cloud mask to tell the interpolating function which values to update
    • using only timeframes from a timerange all EOPatches have in common (from earliest date to latest date)

LPIS data preparation

  • From scratch, LPIS data is divided into 200 different crop type classes. As the classification is based on spectral signatures, those have to be distinctive. 200 classes are obviously too detailed for achieving accurate prediction results. Therefore you group these classes into reasonable groups also based on similar spectral characteristics using the two CSV files from the "General data" folder. The basic grouping defines 14 groups namely: Grass, Maize, Orchards, Peas, Potatoes, Pumpkins, Soybean, Summer cereals, Sunflower, Vegetables, Vine-yards, Winter cereals, Winter rape, Other. This grouping turned out to perform best in classification.
  • After the grouping, the data set stored in vector format is converted into a raster format. Thus, each EO pixel can be assigned to a crop type value. All polygons belonging to one of the classes are separately burned to the raster mask.
  • In order to get rid of artifacts with a width of 1 pixel, and mixed pixels at the edges between polygons of different classes you perform an erosion. That means a buffer of 1 pixel (10m) size is applied to each individual field in the border area.

Sampling

By a spatial sampling of the EOPatches you randomly take a subset of pixels from a patch to use in the machine learning training and testing. Here you only want to consider classes that are represented to a certain quantity of pixels.

  • Remember your "pixel_tres" configuration - a threshold of 1000 pixel is necessary for a class to be considered in sampling
  • Remember your "samp_class" configuration - 500 pixel per class per EOPatch are sampled
In [18]:
# for linear interpolation find earliest and latest overlapping dates

# list EOPatches
eopatches = []

patch_list = os.listdir(patch_path)

for i in patch_list:
    eopatches.append(EOPatch.load(os.path.join(patch_path, i), lazy_loading=True))

eopatches = np.array(eopatches)

# identify  earliest date
timelist = []
for eopatch in eopatches:
    timelist.append(eopatch.timestamp[0])
mindate = str(max(timelist).date())
print('Earliest date: ' + str(max(timelist)))

# identify  latest date
timelist = []
for eopatch in eopatches:
    timelist.append(eopatch.timestamp[-1])
maxdate = str(min(timelist).date())
print('Latest date: ' + str(min(timelist)))
Earliest date: 2018-01-06 10:04:51
Latest date: 2018-09-28 10:00:24

Set up your 3. EOWorkflow - Feature engineering/ Crop type grouping/ Sampling

In [19]:
# TASK FOR LOADING EXISTING EOPATCHES
load = LoadTask(patch_path)

# TASK FOR CONCATENATION
# bands and indices are concatenated into one features dictionary
concatenate = ConcatenateData('FEATURES', ['MY-S2-L2A-BANDS','NDVI','NDWI','NORM'])

# TASK FOR LINEAR INTERPOLATION
# linear interpolation of full time-series and date resampling
resample_range = (mindate, maxdate, day_range)
linear_interp = LinearInterpolation(
    'FEATURES', # name of field to interpolate
    mask_feature=(FeatureType.MASK, 'VALID_DATA'), # mask to be used in interpolation
    copy_features=[(FeatureType.VECTOR_TIMELESS, 'LPIS_{}'.format(year))], # features to keep
    resample_range=resample_range, # set the resampling range
    bounds_error=False # extrapolate with NaN's
)

# TASK TO FIX AUSTRIAN LPIS DATA
# on the basis of the wrongly defined column "SNAR_BEZEI" 
# a column "SNAR_BEZEI_NAME" is added which defines the LPIS class
fixlpis = FixLPIS(feature='LPIS_{}'.format(year), country='Austria')

# TASK FOR GROUPING LPIS INTO WANTED CLASSES
# on the basis of the two grouping files an individual crop type grouping can be applied
# for changes these files have to be adapted
grouplpis = GroupLPIS(year=year, lpis_to_group_file=lpis_to_group_file, crop_group_file=crop_group_file)

# TASK FOR CONVERTING LPIS DATA FROM VECTOR TO RASTER FORMAT
# multiple rasterized layers appling different crop type groupings can be stored in an EOPatch
vtr = VectorToRaster(
    vector_input=(FeatureType.VECTOR_TIMELESS, 'LPIS_{}'.format(year)), 
    raster_feature=(FeatureType.MASK_TIMELESS, 'LPIS_class_{}'.format(grouping_id)),
    values_column='GROUP_1_ID',
    raster_shape=(FeatureType.DATA, 'FEATURES'),
    no_data_value=0)

# TASK FOR EROSION
# erode each class of the reference map
erosion = ErosionTask(mask_feature=(FeatureType.MASK_TIMELESS,'LPIS_class_{}'.format(grouping_id),
                                    'LPIS_class_{}_ERODED'.format(grouping_id)), disk_radius=1)

# TASK FOR SPATIAL SAMPLING
spatial_sampling = SamplingTaskTask(grouping_id, pixel_thres, samp_class)

# TASK FOR SAVING TO OUTPUT
save = SaveTask(patch_path, overwrite_permission=OverwritePermission.OVERWRITE_PATCH)

# define the workflow
workflow = LinearWorkflow(load, 
                          concatenate, 
                          linear_interp, 
                          fixlpis, 
                          grouplpis, 
                          vtr, 
                          erosion, 
                          spatial_sampling,
                          save)

Run third EOWorkflow

In [21]:
pbar = tqdm(total=len(patch_list))
for patch_name in patch_list:
    extra_param = {load: {'eopatch_folder': patch_name},
                   grouplpis: {'col_cropN_lpis': 'SNAR_BEZEI_NAME',
                              'col_cropN_lpistogroup': 'CROP_ID'},
                   save: {'eopatch_folder': patch_name}}

    workflow.execute(extra_param)
    pbar.update(1)

EOPatch data visualization

Now, after all the data is transformed and sampled let's load the single EOPatch again and look at the structure. By executing

EOPatch.load(os.path.join(patch_path, 'eopatch_0_col-0_row-0')

You obtain the following structure:

EOPatch(
  data: {
    FEATURES: numpy.ndarray(shape=(31, 1033, 1040, 13), dtype=float64)
    FEATURES_SAMPLED: numpy.ndarray(shape=(31, 6000, 1, 13), dtype=float64)
  }
  mask: {}
  scalar: {}
  label: {}
  vector: {}
  data_timeless: {}
  mask_timeless: {
    LPIS_class_basic: numpy.ndarray(shape=(1033, 1040, 1), dtype=uint8)
    LPIS_class_basic_ERODED: numpy.ndarray(shape=(1033, 1040, 1), dtype=uint8)
    LPIS_class_basic_ERODED_SAMPLED: numpy.ndarray(shape=(6000, 1, 1), dtype=uint8)
  }
  scalar_timeless: {}
  label_timeless: {}
  vector_timeless: {
    LPIS_2018: geopandas.GeoDataFrame(columns=['geometry', 'FS_KENNUNG', 'SL_FLAECHE', 'ID', 'SNAR_BEZEI', 'DateImported', 'SNAR_BEZEI_NAME', 'CROP_ID', 'english', 'slovenian', 'latin', 'GROUP_1', 'GROUP_1_original', 'GROUP_1_ID'], length=4140, crs=epsg:32633)
  }
  meta_info: {}
  bbox: BBox(((420717.14926283853, 5329441.919254168), (431121.7036578405, 5339770.083848184)), crs=EPSG:32633)
  timestamp: [datetime.datetime(2018, 1, 29, 0, 0), ..., datetime.datetime(2018, 9, 26, 0, 0)], length=31
)

Things have changed, haven't they?

Your 10 spectral bands and 3 indices are combined in FEATURES and the randomly sampled pixels are stored in FEATURES_SAMPLED. After filtering, your valid data masks have been deleted and your eroded and sampled reference data is available in practical raster format as mask_timeless.FeatureType.

Combine samples and split into train and test data

As you performed the spatial sampling for each patch separately you have to combine the samples. But first you have to assign your EOPatches either to the training or validation dataset. In this case you take one in four EOPatches for testing.

Only classes present in both train and test dataset are considered in the classification.

The sampled features and labels are loaded and reshaped into $n \times m$, where $n$ represents the number of training pixels, and $m = f \times t$ the number of all features, with $f$ the size of bands and band combinations (in this example 13) and $t$ the length of the resampled time-series (in this example 34)

Terminology: In data science features are commonly refered to as "X" and labels as "y"

In [22]:
patch_list = os.listdir(patch_path) # update patch list

# combine EOPatches to one dataset
eopatches = []
for i in patch_list:
    eopatches.append(EOPatch.load(os.path.join(patch_path, i), lazy_loading=True))

eopatches = np.array(eopatches)

# depending on the number of EOPatches adjust test_ratio if necessary and split into test and train data accordingly
if len(patch_list) == 1:
    # split combined dataset into train and test data
    X_train, X_test, y_train, y_test, n_timesteps, n_features = train_test_split_eopatch(eopatches, 
                                                                features_dict, 
                                                                labels_dict)

elif len(patch_list) < 4:
    test_ratio = 3
    # split combined dataset into train and test data
    X_train, X_test, y_train, y_test, n_timesteps, n_features = train_test_split_eopatches(eopatches, 
                                                                  test_ratio, 
                                                                  features_dict, 
                                                                  labels_dict)

else:
    # split combined dataset into train and test data
    X_train, X_test, y_train, y_test, n_timesteps, n_features = train_test_split_eopatches(eopatches, 
                                                                  test_ratio, 
                                                                  features_dict, 
                                                                  labels_dict)

# mask out labels that are not in both train and test data and also mask out samples where features include NaN values
X_train, X_test, y_train, y_test = masking(X_train, X_test, y_train, y_test)

total_samp_count = X_train.shape[0] + X_test.shape[0]
print('From your {} EOPatch(es) at total of {} samples were taken. '
      'This sampling dataset includes {} training and {} test samples.'.format(len(patch_list), 
                                                                               total_samp_count, 
                                                                               X_train.shape[0], 
                                                                               X_test.shape[0]))
From your 4 EOPatch(es) at total of 22070 samples were taken. This sampling dataset includes 16268 training and 5802 test samples.

Plot sample distribution

In [23]:
fig = plt.figure(figsize=(20, 15))

y_ids_train, y_counts_train = np.unique(y_train, return_counts=True)

plt.subplot(2, 1, 1)
plt.bar(range(len(y_ids_train)), y_counts_train)
plt.xticks(range(len(y_ids_train)), [class_names[i] for i in y_ids_train], rotation=90, fontsize=20);
plt.yticks(fontsize=20)
plt.grid(True)
plt.title('Training samples', size=20)

y_ids_test, y_counts_test = np.unique(y_test, return_counts=True)

plt.subplot(2, 1, 2)
plt.bar(range(len(y_ids_test)), y_counts_test)
plt.xticks(range(len(y_ids_test)), [class_names[i] for i in y_ids_test], rotation=90, fontsize=20);
plt.yticks(fontsize=20)
plt.grid(True)
plt.title('Test samples', size=20)

fig.subplots_adjust(wspace=0, hspace=1)

As you can see you have managed to generate a well balanced dataset. In both your 3/4 training and 1/4 test dataset no group is under or over represented, which provides a reasonable basis for the following classification.

Scaling and one-hot-encoding

In the following you want to feed your samples into two different algorithms. To guarantee equivalent conditions for both models you need scaled features and one-hot-endcoded labels.

In [24]:
# scale features
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)
X_train = np.reshape(X_train, (-1,n_timesteps,n_features))
X_test = np.reshape(X_test, (-1,n_timesteps,n_features))
# save feature related scaling properties
joblib.dump(scaler, os.path.join(samples_path, 
                                 'Scaler_{}.bin'.format(grouping_id)), 
                                 compress=True)

# labels one- hot-encoding
y_train = y_train.reshape(-1, 1)
y_test = y_test.reshape(-1, 1)
enc = OneHotEncoder(sparse=False)
enc.fit(np.array(class_ids).reshape(-1, 1))
y_train = enc.transform(y_train)
y_test = enc.transform(y_test)
label_count = y_train.shape[1]

Save or load samples (optional)

You can choose to save your samples for later applications. For entering the upcoming part of prediction, this is not necessary.

In [25]:
# save samples(optional)
np.save(os.path.join(samples_path, 'X_train_{}'.format(grouping_id)), X_train)

np.save(os.path.join(samples_path, 'X_test_{}'.format(grouping_id)), X_test)

np.save(os.path.join(samples_path, 'y_train_{}'.format(grouping_id)), y_train)

np.save(os.path.join(samples_path, 'y_test_{}'.format(grouping_id)), y_test)
In [26]:
# load samples(optional)
X_train = np.load(os.path.join(samples_path, 'X_train_{}.npy'.format(grouping_id)))

X_test = np.load(os.path.join(samples_path, 'X_test_{}.npy'.format(grouping_id)))

y_train = np.load(os.path.join(samples_path, 'y_train_{}.npy'.format(grouping_id)))

y_test = np.load(os.path.join(samples_path, 'y_test_{}.npy'.format(grouping_id)))

6. Prediction

Congrats, you've mastered the heavy preprocessing steps! Now, this is where the magic of Machine and Deep Learning happens.

State-of-the-art LightGBM is used as a ML model. It is a fast, distributed, high-performance gradient boosting framework based on decision tree algorithms, used for many ML tasks.

As novel competitors, TempCNN DL architectures are entering the game. So far Convolutional Neural Networks were mainly and successfully applied for image and language recognition tasks. Modifying the convolutional filters of the architectures the Temporal CNN is supposed to exploit the temporal information of satellite image time series.

Set up and train LightGBM model

The default hyper-parameters are used in this example. For more info on parameter tuning, check the documentation of the package.

In [27]:
%%time

# Set up training classes
rev_y_train = [np.argmax(y, axis=None, out=None) for y in y_train]
rev_y_train_unique = np.unique(rev_y_train)

# reshape features from count-timeframes-features to timeframes-count-features
a, b, c = X_train.shape
X_train_lgbm = X_train.reshape(a,b * c)


# Set up the LightGBM model
model_lgbm = lgb.LGBMClassifier(
    objective='multiclass', 
    num_class=len(rev_y_train_unique), 
    metric='multi_logloss'
)

# Train the model
model_lgbm.fit(X_train_lgbm, rev_y_train)

# Save the model
joblib.dump(model_lgbm, os.path.join(models_path, 'model_lgbm_CropTypeClass_{}.pkl'.format(grouping_id)))
Wall time: 1min 2s
Out[27]:
['C:\\Users\\willing\\AT_EOPatches\\CropTypeClassWorkflow_basicFolder_final_Frank\\Output\\Models\\model_lgbm_CropTypeClass_basic.pkl']

Set up and train TempCNN model

In this example an approved architecture from the scientific paper linked above is adopted.

In [28]:
%%time

# Set up the TempCNN architecture
model_tcnn = Sequential()
model_tcnn.add(Conv1D(filters=5, kernel_size=3, activation='relu', input_shape=(n_timesteps,n_features)))
model_tcnn.add(Dropout(0.5))
model_tcnn.add(Conv1D(filters=5, kernel_size=3, activation='relu', input_shape=(n_timesteps,n_features)))
model_tcnn.add(Dropout(0.5))
model_tcnn.add(Conv1D(filters=5, kernel_size=3, activation='relu', input_shape=(n_timesteps,n_features)))
model_tcnn.add(Dropout(0.5))
model_tcnn.add(Flatten())
model_tcnn.add(Dense(256, activation='relu'))
model_tcnn.add(Dense(label_count, activation='softmax'))
model_tcnn.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])

# Train the model
model_tcnn.fit(X_train, 
          y_train, 
          validation_data=(X_test, y_test), 
          epochs=10, 
          batch_size=32, 
          verbose=1,  
          shuffle=True)

# Save the model
joblib.dump(model_tcnn, os.path.join(models_path, 'model_tcnn_CropTypeClass_{}.pkl'.format(grouping_id)))
WARNING:tensorflow:From C:\Users\willing\AppData\Roaming\Python\Python37\site-packages\tensorflow\python\framework\op_def_library.py:263: colocate_with (from tensorflow.python.framework.ops) is deprecated and will be removed in a future version.
Instructions for updating:
Colocations handled automatically by placer.
C:\Users\willing\AppData\Roaming\Python\Python37\site-packages\tensorflow\python\framework\tensor_util.py:125: DeprecationWarning: np.asscalar(a) is deprecated since NumPy v1.16, use a.item() instead
  tensor_proto.float_val.extend([np.asscalar(x) for x in proto_values])
C:\Users\willing\AppData\Roaming\Python\Python37\site-packages\tensorflow\python\framework\tensor_util.py:131: DeprecationWarning: np.asscalar(a) is deprecated since NumPy v1.16, use a.item() instead
  tensor_proto.int_val.extend([np.asscalar(x) for x in proto_values])
WARNING:tensorflow:From C:\Users\willing\AppData\Roaming\Python\Python37\site-packages\keras\backend\tensorflow_backend.py:3445: calling dropout (from tensorflow.python.ops.nn_ops) with keep_prob is deprecated and will be removed in a future version.
Instructions for updating:
Please use `rate` instead of `keep_prob`. Rate should be set to `rate = 1 - keep_prob`.
C:\Users\willing\AppData\Roaming\Python\Python37\site-packages\tensorflow\python\framework\tensor_util.py:157: DeprecationWarning: np.asscalar(a) is deprecated since NumPy v1.16, use a.item() instead
  tensor_proto.bool_val.extend([np.asscalar(x) for x in proto_values])
WARNING:tensorflow:From C:\Users\willing\AppData\Roaming\Python\Python37\site-packages\tensorflow\python\ops\math_ops.py:3066: to_int32 (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.cast instead.
C:\Users\willing\AppData\Roaming\Python\Python37\site-packages\tensorflow\python\framework\tensor_util.py:134: DeprecationWarning: np.asscalar(a) is deprecated since NumPy v1.16, use a.item() instead
  tensor_proto.int64_val.extend([np.asscalar(x) for x in proto_values])
Train on 16268 samples, validate on 5802 samples
Epoch 1/10
16268/16268 [==============================] - 4s 264us/step - loss: 1.6354 - acc: 0.4436 - val_loss: 0.9912 - val_acc: 0.6891
Epoch 2/10
16268/16268 [==============================] - 2s 141us/step - loss: 1.1542 - acc: 0.6087 - val_loss: 0.8709 - val_acc: 0.7504
Epoch 3/10
16268/16268 [==============================] - 2s 141us/step - loss: 1.0235 - acc: 0.6489 - val_loss: 0.8830 - val_acc: 0.7187
Epoch 4/10
16268/16268 [==============================] - 2s 143us/step - loss: 0.9558 - acc: 0.6737 - val_loss: 0.8499 - val_acc: 0.7358
Epoch 5/10
16268/16268 [==============================] - 2s 142us/step - loss: 0.8967 - acc: 0.6952 - val_loss: 0.8241 - val_acc: 0.7477
Epoch 6/10
16268/16268 [==============================] - 2s 140us/step - loss: 0.8723 - acc: 0.7024 - val_loss: 0.8066 - val_acc: 0.7527
Epoch 7/10
16268/16268 [==============================] - 2s 144us/step - loss: 0.8256 - acc: 0.7211 - val_loss: 0.7795 - val_acc: 0.7635
Epoch 8/10
16268/16268 [==============================] - 2s 142us/step - loss: 0.8146 - acc: 0.7231 - val_loss: 0.8292 - val_acc: 0.7468
Epoch 9/10
16268/16268 [==============================] - 2s 142us/step - loss: 0.7878 - acc: 0.7345 - val_loss: 0.8329 - val_acc: 0.7670
Epoch 10/10
16268/16268 [==============================] - 2s 145us/step - loss: 0.7690 - acc: 0.7431 - val_loss: 0.7905 - val_acc: 0.7684
Wall time: 27.4 s
Out[28]:
['C:\\Users\\willing\\AT_EOPatches\\CropTypeClassWorkflow_basicFolder_final_Frank\\Output\\Models\\model_tcnn_CropTypeClass_basic.pkl']

Validation and evaluation

Validation of the models is a crucial step in data science. All models are wrong, but some are less wrong than others, so model evaluation is important.

In order to validate the models, we use the training set to predict the classes, and then compare the predicted set of labels to the "ground truth".

The validation is performed by evaluating various metrics, such as accuracy, precision, recall, $F_1$ score, some of which are nicely described in this blog post

Get the overall accuracy (OA) and the weighted $F_1$ score

In [29]:
# reshape features from count-timeframes-features to timeframes-count-features
# and set up training classes
d, e, f = X_test.shape
X_test_lgbm = X_test.reshape(d, e * f)
rev_y_test = [np.argmax(y, axis=None, out=None) for y in y_test]

# Load the models
model_lgbm = joblib.load(os.path.join(models_path, 'model_lgbm_CropTypeClass_{}.pkl'.format(grouping_id)))
model_tcnn = joblib.load(os.path.join(models_path, 'model_tcnn_CropTypeClass_{}.pkl'.format(grouping_id)))

# get overall accuracy and weighted F1-score for LightGBM
py_test_lgbm = model_lgbm.predict(X_test_lgbm)
print('Classification accuracy LightGBM {:.1f}%'.format(100 * metrics.accuracy_score(rev_y_test, py_test_lgbm)))
print('Classification F1-score LightGBM {:.1f}%'.format(100 * metrics.f1_score(rev_y_test, py_test_lgbm, average='weighted')))

py_test_tcnn = model_tcnn.predict_classes(X_test)
print('Classification accuracy TempCNN {:.1f}%'.format(100 * metrics.accuracy_score(rev_y_test, py_test_tcnn)))
print('Classification F1-score TempCNN {:.1f}%'.format(100 * metrics.f1_score(rev_y_test, py_test_tcnn, average='weighted')))
Classification accuracy LightGBM 82.7%
Classification F1-score LightGBM 82.6%
Classification accuracy TempCNN 76.8%
Classification F1-score TempCNN 75.9%

$F_1$ score, precision, and recall for each class separately

In [30]:
# LightGBM: F1-score, precision, and recall for each class separately
class_labels = np.unique(rev_y_test)
class_names = [entry.class_name for entry in LPISCLASS]

f1_scores = metrics.f1_score(rev_y_test, py_test_lgbm, labels=class_labels, average=None)
recall = metrics.recall_score(rev_y_test, py_test_lgbm, labels=class_labels, average=None)
precision = metrics.precision_score(rev_y_test, py_test_lgbm, labels=class_labels, average=None) 

print('LightGBM:')
print('             Class              =  F1    | Recall   | Precision')
print('         --------------------------------------------------')
for idx, croptype in enumerate([class_names[idx] for idx in class_labels]):
    print('         * {0:20s} = {1:2.1f}    |  {2:2.1f}     | {3:2.1f}'.format(croptype, 
                                                                         f1_scores[idx] * 100, 
                                                                         recall[idx] * 100, 
                                                                         precision[idx] * 100))

# TempCNN: F1-score, precision, and recall for each class separately
class_names = [entry.class_name for entry in LPISCLASS]

f1_scores = metrics.f1_score(rev_y_test, py_test_tcnn, labels=class_labels, average=None)
recall = metrics.recall_score(rev_y_test, py_test_tcnn, labels=class_labels, average=None)
precision = metrics.precision_score(rev_y_test, py_test_tcnn, labels=class_labels, average=None) 

print('TempCNN:')
print('             Class              =  F1    | Recall   | Precision')
print('         --------------------------------------------------')
for idx, croptype in enumerate([class_names[idx] for idx in class_labels]):
    print('         * {0:20s} = {1:2.1f}    |  {2:2.1f}     | {3:2.1f}'.format(croptype, 
                                                                         f1_scores[idx] * 100, 
                                                                         recall[idx] * 100, 
                                                                         precision[idx] * 100))
LightGBM:
             Class              =  F1    | Recall   | Precision
         --------------------------------------------------
         * Peas                 = 86.0    |  95.1     | 78.5
         * Grass                = 72.7    |  90.8     | 60.7
         * Winter rape          = 98.8    |  97.7     | 100.0
         * Maize                = 94.8    |  95.5     | 94.2
         * Winter cereals       = 96.9    |  96.5     | 97.3
         * Pumpkins             = 78.0    |  67.0     | 93.3
         * Summer cereals       = 83.4    |  76.3     | 92.0
         * Vegetables           = 90.9    |  92.8     | 89.1
         * Potatoes             = 66.0    |  53.4     | 86.4
         * Other                = 58.3    |  59.5     | 57.2
         * Soybean              = 81.7    |  95.0     | 71.7
         * Orchards             = 81.9    |  73.0     | 93.4
TempCNN:
             Class              =  F1    | Recall   | Precision
         --------------------------------------------------
         * Peas                 = 74.5    |  91.2     | 62.9
         * Grass                = 67.3    |  79.8     | 58.1
         * Winter rape          = 97.7    |  96.4     | 98.9
         * Maize                = 90.2    |  91.4     | 88.9
         * Winter cereals       = 90.5    |  95.2     | 86.3
         * Pumpkins             = 69.9    |  60.6     | 82.7
         * Summer cereals       = 77.4    |  73.6     | 81.6
         * Vegetables           = 83.6    |  86.8     | 80.7
         * Potatoes             = 43.0    |  34.8     | 56.1
         * Other                = 47.3    |  39.4     | 59.0
         * Soybean              = 89.1    |  95.0     | 83.9
         * Orchards             = 78.9    |  76.8     | 81.2

Plot the standard Confusion Matrix for LightGBM

In [31]:
fig = plt.figure(figsize=(20, 20))
conf_matrix_gbm = metrics.confusion_matrix(rev_y_test, py_test_lgbm)
plot_confusion_matrix(conf_matrix_gbm,
                      classes=[name for idx, name in enumerate(class_names) if idx in class_labels],
                      normalize=True,
                      ylabel='Truth (CROPS)',
                      xlabel='Predicted (LightGBM)',
                      title='Confusion matrix');

Plot the standard Confusion Matrix for TempCNN

In [32]:
fig = plt.figure(figsize=(20, 20))
conf_matrix_gbm = metrics.confusion_matrix(rev_y_test, py_test_tcnn)
plot_confusion_matrix(conf_matrix_gbm,
                      classes=[name for idx, name in enumerate(class_names) if idx in class_labels],
                      normalize=True,
                      ylabel='Truth (CROPS)',
                      xlabel='Predicted (TempCNN)',
                      title='Confusion matrix');

The validation of the models shows that for most of the groups both perform very well. However, there seem to be differences in their confusion for certain classes:

  • In this specific case orchards might catch your attention mostly. LightGBM performs worse than TempCNN. But more interesting than the overall accuracy is, that LightGBM classifies actual orchards as grass a lot (low recall) while, no other class is mistaken as orchards (high precision). In contrast TempCNN recognizes actual orchards well (high recall) but identifies acutal grass as orchards frequently (lower precision). Generally, confusion with grass class is not surprising, as there is a lot of it between the individual trees.
  • There is also poor performance received for potatoes in both models as their cultivation practices are quite similar to peas.
  • Poor performance for the group Other is expectable in consequence of its diverse class composition

Most important features

The LightGBM model contains the information about feature importances. Let's check which features are most important for classification.

In [33]:
eopatch_name = 'eopatch_0_col-0_row-0' # get the name of the first newly created EOPatch
eopatch = EOPatch.load(os.path.join(patch_path, eopatch_name), lazy_loading=True)

timeframe_count = eopatch.data['FEATURES'].shape[0]
features_count = eopatch.data['FEATURES'].shape[3]

del eopatch

z = model_lgbm.feature_importances_.reshape((timeframe_count, features_count))

fnames = ['B02','B03','B04','B05','B06','B07','B08','B8A','B11','B12','NDVI','NDWI','NORM']

fig = plt.figure(figsize=(15, 15))
ax = plt.gca()

# plot the importances
im = ax.imshow(z, aspect=0.25)
plt.xticks(range(len(fnames)), fnames, rotation=45, fontsize=20)
plt.yticks(range(timeframe_count), ['T{}'.format(i) for i in range(timeframe_count)], fontsize=20)
plt.xlabel('Bands and band related features', fontsize=20)
plt.ylabel('Time frames', fontsize=15)
plt.ylim(top=-0.5, bottom=timeframe_count - 0.5)
ax.xaxis.tick_top()
ax.xaxis.set_label_position('top')


fig.subplots_adjust(wspace=0, hspace=0)

cb = fig.colorbar(im, ax=[ax], orientation='horizontal', pad=0.01, aspect=100)
cb.ax.tick_params(labelsize=20)
cb.set_label('Feature importance', fontsize=15)

As you can see, the most important features for LightGBM are recorded within the main growth period. Here different growing stages can be detected that constitute certain crop types.

Prediction

Now that both models have been validated, the remaining thing is to predict the whole AOI. As LightGBM receives higher overall accurays it is used for further predictions. If you are interested in a specific crop group TempCNN is outperforming LightGBM simply change the following configuration.

In [34]:
# swap commentation for using a different model

# model = joblib.load(os.path.join(models_path, 'model_tcnn_CropTypeClass_{}.pkl'.format(grouping_id))) # load TempCNN model
model = joblib.load(os.path.join(models_path, 'model_lgbm_CropTypeClass_{}.pkl'.format(grouping_id))) # load LightGBM model

# load respective feature scaler
scaler = joblib.load(os.path.join(samples_path, 'Scaler_{}.bin'.format(grouping_id)))

In the following you define a workflow to make a prediction on the existing EOPatches. The EOTask accepts the features and the names for the labels. In addition you export GeoTIFF images of the prediction to easily access your visual results.

Set up your 4. EOWorklow - Prediction

In [35]:
# TASK TO LOAD EXISTING EOPATCHES
load = LoadTask(patch_path)

# TASK FOR PREDICTION
predict = PredictPatch(model, (FeatureType.DATA, 'FEATURES'), 'LBL_GBM', scaler)

# TASK TO EXPORT TIFF
export_tiff = ExportToTiff((FeatureType.MASK_TIMELESS, 'LBL_GBM'))
tiff_location = predictions_path
if not os.path.isdir(tiff_location):
    os.makedirs(tiff_location)
    
# TASK FOR SAVING
save = SaveTask(patch_path, overwrite_permission=OverwritePermission.OVERWRITE_PATCH)


workflow = LinearWorkflow(load,
                          predict,
                          export_tiff,
                          save)

Run fourth EOWorkflow

In [36]:
patch_list = os.listdir(patch_path) # update patch list
# execute workflow
pbar = tqdm(total=len(patch_list))
for patch_name in patch_list:
    extra_param = {load: {'eopatch_folder': patch_name}, 
                   export_tiff: {'filename': '{}/prediction_{}.tiff'.format(predictions_path, patch_name)}, 
                   save: {'eopatch_folder': patch_name}}

    workflow.execute(extra_param)
    pbar.update()

EOPatch data visualization

Finishing the last processing step, let's have a look at the final EOPatch by executing

EOPatch.load(os.path.join(patch_path, 'eopatch_0_col-0_row-0')

You obtain the following structure which is extended by your predicted data stored as LBL_GBM in mask_timeless.FeatureType:

EOPatch(
  data: {
    FEATURES: numpy.ndarray(shape=(34, 1028, 1033, 13), dtype=float64)
    FEATURES_SAMPLED: numpy.ndarray(shape=(34, 6000, 1, 13), dtype=float64)
  }
  mask: {}
  scalar: {}
  label: {}
  vector: {}
  data_timeless: {}
  mask_timeless: {
    LBL_GBM: numpy.ndarray(shape=(1028, 1033, 1), dtype=int64)
    LPIS_class_basic: numpy.ndarray(shape=(1028, 1033, 1), dtype=uint8)
    LPIS_class_basic_ERODED: numpy.ndarray(shape=(1028, 1033, 1), dtype=uint8)
    LPIS_class_basic_ERODED_SAMPLED: numpy.ndarray(shape=(6000, 1, 1), dtype=uint8)
  }
  scalar_timeless: {}
  label_timeless: {}
  vector_timeless: {
    LPIS_2018: geopandas.GeoDataFrame(columns=['geometry', 'FS_KENNUNG', 'SL_FLAECHE', 'ID', 'SNAR_BEZEI', 'DateImported', 'SNAR_BEZEI_NAME', 'CROP_ID', 'english', 'slovenian', 'latin', 'GROUP_1', 'GROUP_1_original', 'GROUP_1_ID'], length=4091, crs=EPSG:32633)
  }
  meta_info: {}
  bbox: BBox(((420862.3179607267, 5329537.336315366), (431194.28800678457, 5339817.792378783)), crs=CRS('32633'))
  timestamp: [datetime.datetime(2018, 1, 6, 0, 0), ..., datetime.datetime(2018, 9, 27, 0, 0)], length=34
)

Visualization of the results

Visualize predicted EOPatch data

In [37]:
eopatch_name = 'eopatch_0_col-0_row-0' # get the name of the first newly created EOPatch
eopatch = EOPatch.load(os.path.join(patch_path, eopatch_name), lazy_loading=True)

# update colormap
cb_classes = np.unique(np.unique(eopatch.mask_timeless['LBL_GBM']))
custom_cmap = mpl.colors.ListedColormap([lpisclass_cmap.colors[i] for i in cb_classes])
custom_norm = mpl.colors.BoundaryNorm(np.arange(-0.5, len(cb_classes), 1), custom_cmap.N)

# mask prediction - exclude pixel with no LPIS reference
labels = np.array(eopatch.mask_timeless['LPIS_class_{}'.format(grouping_id)])
mask = labels == 0
labelspred = np.array(eopatch.mask_timeless['LBL_GBM'])
LBL = np.ma.masked_array(labelspred, mask)

# plot figure
fig, ax = plt.subplots(figsize=(20, 20))
im = ax.imshow(LBL.squeeze(), cmap=lpisclass_cmap, norm=lpisclass_norm)
ax.set_xticks([])
ax.set_yticks([])
ax.set_aspect('auto')

fig.subplots_adjust(wspace=0, hspace=0)

# plot colorbar
cb = fig.colorbar(mpl.cm.ScalarMappable(norm=custom_norm, cmap=custom_cmap), 
                  orientation="horizontal", 
                  pad=0.01, 
                  aspect=100)
cb.ax.tick_params(labelsize=20)
cb.set_ticks(range(len(cb_classes)))
cb.ax.set_xticklabels([class_names[i] for i in cb_classes], rotation=90, fontsize=15)
plt.show()

del eopatch

Compare ground truth and prediction

In [38]:
eopatch_name = 'eopatch_0_col-0_row-0' # get the name of the first newly created EOPatch
eopatch = EOPatch.load(os.path.join(patch_path, eopatch_name), lazy_loading=True)

# mask prediction - exclude pixel with no LPIS reference
labels = np.array(eopatch.mask_timeless['LPIS_class_{}'.format(grouping_id)])
mask = labels == 0
labelspred = np.array(eopatch.mask_timeless['LBL_GBM'])
LBL = np.ma.masked_array(labelspred, mask)

fig, axes = plt.subplots(2,2,figsize=(20, 10))

# plot prediction
ax1 = plt.subplot(121)
im = ax1.imshow(LBL.squeeze(), cmap=lpisclass_cmap, norm=lpisclass_norm)
plt.title('Prediction')
ax1.set_xticks([])
ax1.set_yticks([])
ax1.set_aspect('auto')

# plot ground truth
ax2 = plt.subplot(122)
im = ax2.imshow(labels.squeeze(), cmap=lpisclass_cmap, norm=lpisclass_norm)
plt.title('Ground truth')
ax2.set_xticks([])
ax2.set_yticks([])
ax2.set_aspect('auto')

axlist=[ax1,ax2]

fig.subplots_adjust(wspace=0, hspace=0)

# plot colorbar
cb = fig.colorbar(mpl.cm.ScalarMappable(norm=custom_norm, cmap=custom_cmap), 
                  ax = axlist,
                  orientation="horizontal", 
                  pad=0.01, 
                  aspect=100)
cb.ax.tick_params(labelsize=20)
cb.set_ticks(range(len(cb_classes)))
cb.ax.set_xticklabels([class_names[i] for i in cb_classes], rotation=90, fontsize=15)
plt.show()

del eopatch

Close-up comparison

In [39]:
# Draw the Reference map
eopatch_name = 'eopatch_0_col-0_row-0' # get the name of the first newly created EOPatch
eopatch = EOPatch.load(os.path.join(patch_path, eopatch_name), lazy_loading=True)

# create red-green colormap
colors = [(0, 1, 0), (1, 0, 0)]  # G -> R
cmap_name = 'my_list'
cm = LinearSegmentedColormap.from_list(
        cmap_name, colors)


fig = plt.figure(figsize=(20, 20))

inspect_size = 100

w, h = labels.squeeze().shape

w_min = np.random.choice(range(w - inspect_size))
h_min = np.random.choice(range(h - inspect_size))

ax = plt.subplot(2, 2, 1)
plt.imshow(labels.squeeze()[w_min: w_min + inspect_size, h_min : h_min + inspect_size],
           cmap=lpisclass_cmap, norm=lpisclass_norm)
plt.xticks([])
plt.yticks([])
ax.set_aspect('auto')
plt.title('Ground truth', fontsize=20)

ax = plt.subplot(2, 2, 2)
plt.imshow(LBL.squeeze()[w_min: w_min + inspect_size, h_min: h_min + inspect_size],
           cmap=lpisclass_cmap, norm=lpisclass_norm)
plt.xticks([])
plt.yticks([])
ax.set_aspect('auto')
plt.title('Prediction', fontsize=20)

ax = plt.subplot(2, 2, 3)
mask = LBL.squeeze() != labels.squeeze()
plt.imshow(mask[w_min: w_min + inspect_size, h_min: h_min + inspect_size], cmap=cm)
plt.xticks([])
plt.yticks([]);
ax.set_aspect('auto')
plt.title('Difference', fontsize=20)

ax = plt.subplot(2, 2, 4)
image = np.clip(eopatch.data['FEATURES'][8][..., [2, 1, 0]] * 3.5, 0, 1)
plt.imshow(image[w_min: w_min + inspect_size, h_min: h_min + inspect_size])
plt.xticks([])
plt.yticks([]);
ax.set_aspect('auto')
plt.title('True Color', fontsize=20)

fig.subplots_adjust(wspace=0.1, hspace=0.1)

del eopatch

As you can probably see in the randomly chosen section of the AOI there are certain patterns of misclassified pixels:

  • There are complete fields mistaken as another crop group. In this case the algorithm got confused because of similar spectral characteristics. You already got an overview of the frequency and combination of those incidents in the evaluation part above.
  • Misclassified single pixels are usually located at the border of the respective fields. Here the "mixed-pixel-problem" impacts the prediction results. For the modeling these pixels were excluded, as they may include spectral reflectance values of different vegetation types and thereby confuse the algorithm.

Next steps

Now, after your first successful classification you are hooked? But the region around Wels in Austria was surprisingly not your actual AOI or you want to try other vegetation groupings? Then here are some suggestions on how you could proceed:

  • Customize configurations

    The notebook offers various possibilities to change parameters and evaluate their effects. Simply enter the configuration section in the beginning and modify e.g. cloudcover thresholds or your sampling strategy.

  • Change the AOI within Austria

    This would be the simplest case to apply. You just have to place a Shapefile or Geojson of your own AOI in the location of the "Area_AOI.geojson" from the example. The size and shape of the included polygon are irrelevant.

  • Try alternative crop groupings

    In order to regroup the LPIS classes you need to have a closer look at the two CSV files in the GeneralData folder.

    • at_lpis_2018_crop_to_group_mapping_basic.csv: Here you can assign LPIS classes to different crop groups.\ _CROPID represents the respective LPIS class\ _GROUP1 represents the respective group you want a class in
    • crop_group_1_definition_basic.csv: Here you can combine or separate individual crop groups by assigning the respective ID.\ _GROUP1 again represents the groups\ _GROUP_1ID represents the respective numeric ID
  • Apply the notebook to another country

    Another country means different AOI plus different LPIS classes.

    • The first requires no additional effort. Change your AOI file and run the processes. EO data is downloaded and processed exactly as in the example.
    • But when it comes to the ground truth data, this is were things get tricky as you additionally need to customize the CSV grouping files for your specific country
In [ ]: